8 research outputs found

On the evaluation of exact-match and range queries over multidimensional data in distributed hash tables

Author: Malensek Matthew
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2012

2012 Fall. Includes bibliographical references. The quantity and precision of geospatial and time series observational data being collected has increased alongside the steady expansion of processing and storage capabilities in modern computing hardware. The storage requirements for this information are vastly greater than the capabilities of a single computer, and are primarily met in a distributed manner. However, distributed solutions often impose strict constraints on retrieval semantics. In this thesis, we investigate the factors that influence storage and retrieval operations on large datasets in a cloud setting, and propose a lightweight data partitioning and indexing scheme to facilitate these operations. Our solution provides expressive retrieval support through range-based and exact-match queries and can be applied over massive quantities of multidimensional data. We provide benchmarks to illustrate the relative advantage of using our solution over a general-purpose cloud storage engine in a distributed network of heterogeneous computing resources.

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets

Author: Malensek Matthew
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2017

2017 Summer. Includes bibliographical references. Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is analytic queries, encompassing both query instrumentation and evaluation. This dissertation is centered around query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time series and geospatial aspects), and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks.

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Polygon-based query evaluation over geospatial data using distributed hash tables

Author: Matthew Malensek
Sangmi Pallickara
Shrideep Pallickara
Publication date: 01/01/2013

Abstract: Data volumes in the geosciences and related domains have grown significantly as sensing equipment designed to continuously gather readings and produce data streams for geographic regions have proliferated. The storage requirements imposed by these datasets vastly outstrip the capabilities of a single computing resource, leading to the use and development of distributed storage frameworks composed of commodity hardware. In this paper, we explore the challenges associated with supporting geospatial retrievals constrained by arbitrary polygonal bounds on a distributed hash table architecture. Our solution involves novel distribution and partitioning of these voluminous datasets, thus enabling the use of a lightweight, distributed spatial indexing structure, the geoavailability grid. Geoavailability grids provide global, coarse-grained representations of the spatial information stored within these ever-expanding datasets, allowing the search space of distributed queries to be reduced by eliminating storage resources that do not hold relevant information. This results in improved response times and more effective utilization of available resources. Geoavailability grids are also applicable in non-distributed settings for local lookup functionality, performing competitively with other leading spatial indexing technology.

CiteSeerX

Polygon-Based Query Evaluation over Geospatial Data Using Distributed Hash Tables

Author: Matthew Malensek
Sangmi Pallickara
Shrideep Pallickara
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Abstract: Data volumes in the geosciences and related domains have grown significantly as sensing equipment designed to continuously gather readings and produce data streams for geographic regions have proliferated. The storage requirements imposed by these datasets vastly outstrip the capabilities of a single computing resource, leading to the use and development of distributed storage frameworks composed of commodity hardware. In this paper, we explore the challenges associated with supporting geospatial retrievals constrained by arbitrary polygonal bounds on a distributed hash table architecture. Our solution involves novel distribution and partitioning of these voluminous datasets, thus enabling the use of a lightweight, distributed spatial indexing structure, the geoavailability grid. Geoavailability grids provide global, coarse-grained representations of the spatial information stored within these ever-expanding datasets, allowing the search space of distributed queries to be reduced by eliminating storage resources that do not hold relevant information. This results in improved response times and more effective utilization of available resources. Geoavailability grids are also applicable in non-distributed settings for local lookup functionality, performing competitively with other leading spatial indexing technology.

CiteSeerX

Crossref

Analytic Queries over Geospatial Time-Series Data Using Distributed Hash Tables

Author: Matthew Malensek
Sangmi Pallickara
Shrideep Pallickara
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Synopsis: A Distributed Sketch over Voluminous Spatiotemporal Observational Streams

Author: Matthew Malensek
Sangmi Lee Pallickara
Shrideep Pallickara
Thilina Buddhika
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Predictive analytics using statistical, learning, and ensemble methods to support real-time exploration of discrete event simulations

Author: Bishop
Breiman
Dean
Domingos
F. Jay Breidt
Foster
Fox
Fujimoto
Harvey
Hastie
Isard
Malensek
Malensek
Malensek
Matthew Malensek
McKay
Mendes-Moreira
Misra
Neil Harvey
Pallickara
Pallickara
Pendell
Sangmi Pallickara
Schapire
Shrideep Pallickara
Sui
Tanenbaum
Wackerly
Walid Budgaga
Wason
White
Publication venue: 'Elsevier BV'

Crossref

Autonomous Orchestration of Distributed Discrete Event Simulations in the Presence of Resource Uncertainty

Author: Bialecki A.
Eklof M.
Green C.
Jefferson D.
Lee G.
Matthew Malensek
Neil Harvey
Portacci K.
Ramírez Ortiz J. L.
Shrideep Pallickara
Zhiquan Sui
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref